CiberAMP requires several packages that will be automatically installed:
r Biocpkg("TCGAbiolinks"), for TCGA data downloading, filtering, normalization and differential expression analysis (DEA).r Biocpkg("SummarizedExperiment"), to manage RNAseq experiments data.r Biocpkg("EDASeq"), to normalize RNAseq counts.r Biocpkg("edgeR"), to perform differential expression analysis (DEA) between two groups of samples.r Biocpkg("RTCGAToolbox"), to download latest GISTIC2.0 tun thresholded-by-file data from Broad's Firehouse data server.r Biocpkg("dplyr") & r Biocpkg("stringr"), to manage strings in R.r Biocpkg("ggplot2") & r Biocpkg("plotly") & r Biocpkg("shiny"), to plot static (ggplot2) and interactive (plotly) graphs with the results.The algorithm requires two mandatory inputs. The first one is a list of gene symbols to analyze. This can range from a single transcript to the whole set of human genes.
genes.of.interest <- c("GENEA", "GENEB", "GENEC", ...)
In this example, we will study which are genes are differentially expressed in association with their SCNVs in the cohort of head-and-neck and lung squamous cell carcinomas (TCGA-HNSC and TCGA-LUSC, respectively). In particular, we will focus our analysis on genes lodge at chromosomes 3, 5 and 8.
The first step is to create a vector with the symbols of the genes that are located at chromosomes 3, 5 and 8:
library(biomaRt) library(stringr)
# First, we get all the genes at chromosomes 3, 5 and 8 using the biomaRt R package. If you have the last version of R, the you can set "TRUE" the "useCache" argument in getBM function. ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", mirror = "useast") genes <- getBM(attributes=c('chromosome_name', 'band', 'start_position', 'end_position', 'strand', 'hgnc_symbol'), filters = "chromosome_name", values = c("3", "5", "8"), mart=ensembl, useCache = FALSE) genes <- genes[genes$hgnc_symbol != "", ] genes <- genes[!duplicated(genes$hgnc_symbol), ] for(i in 1:nrow(genes)) { if(genes$strand[i] < 0) { genes$strand[i] <- "-" }else{ genes$strand[i] <- "+" } }
Now, we have (1) the list of gene symbols to query and (2) the TCGA cohorts to analyze (TCGA-HNSC and TCGA-LUSC). CiberAMP also allows you to designate a path to save your results. By default, it will use the current working directory.
library(ciberAMP) # x <- ciberAMP(genes = reg.genes$hgnc_symbol, cohorts = c("LUSC", "HNSC"), pat.percentage = 0, *writePath = "PATH_TO_FOLDER"*)
However, there are many arguments that allow you to personalize your analysis:
TCGAanalyze_Preprocessing.gcContent or geneLength (default).quantile (default), varFilter, filter1, filter2.IQR. See genefilter documentation for available methods.filt.var.funct.filter1. Defaults to 0.05.Deep, Shallow or BothNULL.NULL.In this example, we will run the analysis with default parameters:
x <- ciberAMP(genes = as.character(genes$hgnc_symbol), cohorts = c("HNSC", "LUSC"))
CiberAMP results into a list of 3 data frames that can be accessed by:
The x[[1]] data frame contains all differentially expressed genes between (1) tumor vs normal and (2) copy number altered vs diploid tumor samples:
#To get the first queried gene (first row) results. head(x[[1]][1, ])
The x[[2]] data frame containing CGC list differential expression results. Columns are the same as x[[1]].
head(x[[2]][1, ])
Finally, the x[[3]] data frame contains SCNVs co-occurrency between the SCNV-DEGs and known cancer driver genes amplification and deletions:
head(x[[3]][1, ])
From the original 1870 genes we found that:
They are mapped at their chromosomal positions:
library(chromPlot) df <- x[[1]] genes_cosmic <- x[[3]] gr.assoc.hnsc <- gr[gr$hgnc_symbol %in% as.character(df[df$Tumor %in% "HNSC" & abs(df$log2FC.SCNAvsDip) >= 1 & df$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] gr.assoc.lusc <- gr[gr$hgnc_symbol %in% as.character(df[df$Tumor %in% "LUSC" & abs(df$log2FC.SCNAvsDip) >= 1 & df$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] both <- intersect(gr.assoc.hnsc$hgnc_symbol, gr.assoc.lusc$hgnc_symbol) chromPlot(gaps=hg_gap, annot1 = gr[gr$hgnc_symbol %in% both, ], chr = c(3,5,8))
To answer if these genes are very close to any SCN-altered CGC in the cohort, we visually plot them (red bars) and the CGC co-occurring ones (yellow bars):
library(chromPlot) df.cosmic <- x[[2]] gr.assoc.hnsc.cosmic <- gr[gr$hgnc_symbol %in% as.character(df.cosmic[df.cosmic$Tumor %in% "HNSC" & abs(df.cosmic$log2FC.SCNAvsDip) >= 1 & df.cosmic$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] gr.assoc.lusc.cosmic <- gr[gr$hgnc_symbol %in% as.character(df.cosmic[df.cosmic$Tumor %in% "LUSC" & abs(df.cosmic$log2FC.SCNAvsDip) >= 1 & df.cosmic$FDR.SCNAvsDip <= 0.01, ]$Gene_Symbol), ] both.cosmic <- intersect(gr.assoc.hnsc.cosmic$hgnc_symbol, gr.assoc.lusc.cosmic$hgnc_symbol) chromPlot(gaps=hg_gap, annot1 = gr[gr$hgnc_symbol %in% both, ], annot2 = gr[gr$hgnc_symbol %in% both.cosmic, ], chr = c(3,5,8), chrSide=c(-1,1,1,1,1,1,1,1))
CiberAMP package provides a ggplot-based function to create a scatter plot where:
In the end, a significantly altered gene can be classified in 8 scenarios:
Let's take a look on those 67 resulting genes:
library(ggplot2) df.exp <- x[[1]] df.exp <- df.exp[df.exp$Gene_Symbol %in% both, ] ggplot.CiberAMP(df.exp)
Finally, CiberAMP provides a function to interactively explore your results. In this app, you can filter genes by:
Clicking on any dot will display a table with the overlapping % of samples in which a SCN-deregulated queried genes co-occurs with any CGC oncodrivers' SCNAs.
Tip: Click on the 'submit' button to initialize the graph!! If 'ALL' is written in the box, then every SCN-altered queried gene from your results will be plotted.
library(shiny) library(plotly) library(DT) int.plot.CiberAMP(df.exp, x[[3]])
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.